Tree-based Fitted Q-iteration for Multi-Objective Markov Decision Processes in Water Resources Management

Authors

  • F. Pianosi
  • A. Castelletti
  • M. Restelli
Abstract

Multi-Objective Markov Decision Processes (MOMDPs) provide an effective modeling framework for multi-objective decision-making problems involving water resources systems. The traditional approach to solving these problems is to consider many single-objective problems (resulting from different combinations of the original problem objectives), each solved using standard optimization techniques. This paper presents a new approach to MOMDPs based on batch-mode reinforcement learning (RL) that learns the operating policies for all linear combinations of the weights assigned to the objectives in a single training process. The key idea is to extend the continuous approximation of the action-value function, which single-objective RL algorithms perform over the state-action space, to the weight space as well. The batch-mode nature of the algorithm makes it possible to enrich the training data without further interaction with the controlled system. The approach is first demonstrated on a numerical test case of a two-objective reservoir and then evaluated on a real-world case study concerning the optimal operation of the Hoa Binh water reservoir in Vietnam. Experimental results on the test case show that the proposed approach (named MOFQI) becomes com...
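The key idea lends itself to a compact illustration. The sketch below, written against the abstract only, augments the input of a tree-based Q-function approximator with the objective weight and replicates each batch transition over a grid of weights, so that one regressor covers every linear scalarization. It assumes a scalar state (e.g. reservoir storage), a finite action set, and two reward components, and uses scikit-learn's ExtraTreesRegressor as a stand-in for the authors' tree-based approximator; the function name `mofqi` and the data layout are illustrative assumptions, not the paper's code.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def mofqi(samples, action_set, weight_grid, gamma=0.95, n_iter=60):
    """Minimal MOFQI-style sketch for a two-objective problem.

    samples:     iterable of (x, u, r1, r2, x_next) one-step transitions,
                 with scalar state x (e.g. storage) and scalar action u
    action_set:  finite set of candidate actions (e.g. discretized releases)
    weight_grid: weights w in [0, 1]; scalarized reward is w*r1 + (1-w)*r2
    """
    # Enrich the batch: replicate every transition over the weight grid.
    # This step needs no further interaction with the controlled system.
    X, R, next_states = [], [], []
    for (x, u, r1, r2, xn) in samples:
        for w in weight_grid:
            X.append([x, u, w])
            R.append(w * r1 + (1.0 - w) * r2)
            next_states.append((xn, w))
    X, R = np.array(X), np.array(R)

    Q = None
    for _ in range(n_iter):
        if Q is None:
            targets = R                                  # Q_0 = 0
        else:
            # Bellman backup: max over the finite action set at x'.
            q_next = np.stack([
                Q.predict([[xn, u, w] for (xn, w) in next_states])
                for u in action_set
            ])
            targets = R + gamma * q_next.max(axis=0)
        # One regressor over the joint (state, action, weight) space.
        Q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)
    return Q
```

Because `w` is an input of the regressor, the greedy action for any state `x` and preference weight `w` is `max(action_set, key=lambda u: Q.predict([[x, u, w]])[0])`, so a single training run serves all weight combinations.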


Similar Papers

Multi-Objective Markov Decision Processes for Data-Driven Decision Support

We present new methodology based on Multi-Objective Markov Decision Processes for developing sequential decision support systems from data. Our approach uses sequential decision-making data to provide support that is useful to many different decision-makers, each with different, potentially time-varying preferences. To accomplish this, we develop an extension of fitted-Q iteration for multiple o...


Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes

This paper presents a new method to learn online policies in continuous state, continuous action, model-free Markov decision processes, with two properties that are crucial for practical applications. First, the policies are implementable with a very low computational cost: once the policy is computed, the action corresponding to a given state is obtained in logarithmic time with respect to the...


Optimizing Spoken Dialogue Management from Data Corpora with Fitted Value Iteration

In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...


Optimizing spoken dialogue management with fitted value iteration

In recent years machine learning approaches have been proposed for dialogue management optimization in spoken dialogue systems. It is customary to cast the dialogue management problem into a Markov Decision Process (MDP) and to find the associated optimal policy using Reinforcement Learning (RL) algorithms. Yet, the dialogue state space is usually very large (even infinite) and standard RL algo...

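The two dialogue-management entries above both rely on fitted value iteration. For reference, here is a minimal generic sketch of that algorithm, not the papers' implementation: Bellman backups are computed exactly on a sample of representative states and then generalized by a regressor. The callables `model(s, a)` (returning `(next_state, probability)` pairs) and `reward(s, a)` are assumed interfaces, and scikit-learn's ExtraTreesRegressor stands in for whichever approximator the papers use.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_value_iteration(states, actions, model, reward,
                           gamma=0.95, n_iter=30):
    """Generic fitted value iteration over representative sample states.

    model(s, a)  -> list of (next_state, probability) pairs (assumed)
    reward(s, a) -> immediate reward (assumed)
    """
    S = np.array(states, dtype=float).reshape(len(states), -1)
    V = lambda X: np.zeros(len(X))                     # V_0 = 0
    for _ in range(n_iter):
        targets = []
        for s in states:
            # Exact Bellman backup at the sampled state
            backups = [
                reward(s, a)
                + gamma * sum(p * V(np.atleast_2d(s2))[0]
                              for s2, p in model(s, a))
                for a in actions
            ]
            targets.append(max(backups))
        # Generalize the backed-up values to the whole state space
        reg = ExtraTreesRegressor(n_estimators=50, random_state=0)
        reg.fit(S, np.array(targets))
        V = lambda X, reg=reg: reg.predict(np.atleast_2d(X))
    return V
```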

Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...

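The decomposition described in the entry above can be illustrated generically (this is not the authors' novel algorithm): Tarjan's algorithm emits strongly connected components in reverse topological order, so value iteration can be run on each restricted MDP with the values of all downstream states already frozen. The dictionaries `P`, `R`, and `adj` are assumed data layouts for this sketch.

```python
def tarjan_scc(adj):
    """Tarjan's algorithm; emits SCCs so that every component only
    reaches components emitted before it (reverse topological order)."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in adj:
        if v not in index:
            visit(v)
    return sccs

def solve_by_components(P, R, adj, gamma=0.9, tol=1e-8):
    """Value iteration restricted to one SCC at a time.

    P[s][a]: list of (next_state, probability); R[s][a]: reward.
    Downstream states belong to already-solved components, so their
    values stay frozen while the current restricted MDP is iterated.
    """
    V = {s: 0.0 for s in adj}
    for comp in tarjan_scc(adj):
        while True:
            delta = 0.0
            for s in comp:
                best = max(
                    R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                    for a in P[s]
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
    return V
```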


Publication date: 2012